Analyzing Approximate Value Iteration Algorithms
Authors
Abstract
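In this paper, we consider the stochastic iterative counterpart of the value iteration scheme wherein only noisy and possibly biased approximations of the Bellman operator are available. We call this counterpart the approximate value iteration (AVI) scheme. Neural networks are often used as function approximators, in order to counter Bellman’s curse of dimensionality. In this paper, they are used to approximate the Bellman operator. Because neural networks are typically trained using sample data, errors and biases may be introduced. The design of AVI accounts for implementations with biased approximations of the Bellman operator and sampling errors. We present verifiable sufficient conditions under which AVI is stable (almost surely bounded) and converges to a fixed point of the approximate Bellman operator. To ensure the stability of AVI, we present three different yet related sets of sufficient conditions that are based on the existence of an appropriate Lyapunov function. These Lyapunov function–based conditions are easily verifiable and new to the literature. The verifiability is enhanced by the fact that a recipe for the construction of the necessary Lyapunov function is also provided. We also show that the stability analysis of AVI can be readily extended to the general case of set-valued stochastic approximations. Finally, we show that AVI can also be used in more general circumstances, that is, for finding fixed points of contractive set-valued maps.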
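As a rough illustration of the kind of scheme the abstract describes, the sketch below runs a stochastic iteration in which only a noisy, biased version of the Bellman operator can be evaluated. It is not the paper's algorithm or notation: the two-state MDP, the constant additive bias, the Gaussian noise, the 1/(n+1) step size, and all function names are assumptions made for this example, and a learned neural approximation is replaced by a simple perturbation of the exact operator.

```python
import random

def bellman(J, P, R, gamma):
    """Exact Bellman optimality operator for a finite MDP (reward maximisation)."""
    n_states = len(J)
    return [
        max(
            R[s][a] + gamma * sum(P[s][a][t] * J[t] for t in range(n_states))
            for a in range(len(R[s]))
        )
        for s in range(n_states)
    ]

def noisy_biased_bellman(J, P, R, gamma, bias, noise_std, rng):
    """Hypothetical stand-in for a learned approximation:
    exact operator + constant bias + zero-mean Gaussian noise."""
    return [v + bias + rng.gauss(0.0, noise_std) for v in bellman(J, P, R, gamma)]

def approximate_value_iteration(P, R, gamma, bias, noise_std, n_iters, seed=0):
    """Stochastic iteration J <- J + a_n * (approx_T(J) - J) with a_n = 1/(n+1)."""
    rng = random.Random(seed)
    J = [0.0] * len(P)
    for n in range(n_iters):
        step = 1.0 / (n + 1)  # diminishing step size (an assumption of this sketch)
        target = noisy_biased_bellman(J, P, R, gamma, bias, noise_std, rng)
        J = [j + step * (t - j) for j, t in zip(J, target)]
    return J

def value_iteration(P, R, gamma, n_iters=200):
    """Plain value iteration with the exact operator, used as a reference."""
    J = [0.0] * len(P)
    for _ in range(n_iters):
        J = bellman(J, P, R, gamma)
    return J

# Made-up two-state, two-action MDP: P[s][a] is a distribution over next states.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
GAMMA, BIAS, NOISE_STD = 0.5, 0.1, 0.2

J_star = value_iteration(P, R, GAMMA)
J_avi = approximate_value_iteration(P, R, GAMMA, BIAS, NOISE_STD, n_iters=50000)

# With a constant bias b, the perturbed operator J -> T(J) + b has fixed point
# J* + b / (1 - gamma), so the iterates should settle near that shifted point.
J_shifted = [v + BIAS / (1.0 - GAMMA) for v in J_star]
print("exact fixed point  :", J_star)
print("AVI iterate        :", J_avi)
print("shifted fixed point:", J_shifted)
```

In this toy setting the iterates stay bounded and approach a fixed point of the perturbed operator instead of the exact optimal values, which mirrors the abstract's statement that AVI converges to a fixed point of the approximate Bellman operator. The discount factor is kept small here because a 1/(n+1) step size converges slowly when the contraction factor is close to 1.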
Similar resources
Topological Value Iteration Algorithms
Value iteration is a powerful yet inefficient algorithm for Markov decision processes (MDPs) because it puts the majority of its effort into backing up the entire state space, which turns out to be unnecessary in many cases. In order to overcome this problem, many approaches have been proposed. Among them, ILAO* and variants of RTDP are state-of-the-art ones. These methods use reachability anal...
Approximate Value Iteration with Temporally Extended Actions
Temporally extended actions have proven useful for reinforcement learning, but their duration also makes them valuable for efficient planning. The options framework provides a concrete way to implement and reason about temporally extended actions. Existing literature has demonstrated the value of planning with options empirically, but there is a lack of theoretical analysis formalizing when pla...
Feature-Discovering Approximate Value Iteration Methods
Sets of features in Markov decision processes can play a critical role in approximately representing value and in abstracting the state space. Selection of features is crucial to the success of a system and is most often conducted by a human. We study the problem of automatically selecting problem features, and propose and evaluate a simple approach reducing the problem of selecting a new featu...
Error Bounds for Approximate Value Iteration
Approximate Value Iteration (AVI) is a method for solving a Markov Decision Problem by making successive calls to a supervised learning (SL) algorithm. A sequence of value representations V_n is processed iteratively by V_{n+1} = A T V_n, where T is the Bellman operator and A an approximation operator. Bounds on the error between the performance of the policies induced by the algorithm and the optimal...
Restricted Value Iteration: Theory and Algorithms
Value iteration is a popular algorithm for finding near optimal policies for POMDPs. It is inefficient due to the need to account for the entire belief space, which necessitates the solution of large numbers of linear programs. In this paper, we study value iteration restricted to belief subsets. We show that, together with properly chosen belief subsets, restricted value iteration yields near-...
Journal
Journal title: Mathematics of Operations Research
Year: 2022
ISSN: 0364-765X, 1526-5471
DOI: https://doi.org/10.1287/moor.2021.1202